The Structured Information Manager: A Database System for SGML Documents

Author

  • Ron Sacks-Davis
Abstract

One of the important standards for document interchange and representation to have emerged is SGML, the Standard Generalized Markup Language. SGML is designed to capture the logical structure of documents, that is, their logical components, such as titles and paragraphs, and the relationships among them. SGML is a complex standard, and designing a database system to manage SGML documents poses many challenges. In this talk, we describe an SGML-conformant database system, the Structured Information Manager (SIM). We illustrate how support for document structure helps in many important applications by describing how SIM has been deployed to provide public access to databases of legislation.

The Structured Information Manager (SIM) is a document database system designed to manage multigigabyte collections of documents containing unstructured text (ASCII), structured text (including SGML and MARC), binary objects (such as images and videos), and other kinds of data. As an information retrieval system, SIM uses a client-server processing model and supports a wide range of user interface platforms, including the command line, MS Windows, Macintosh, and X. SIM uses compressed inverted file technology to access large text collections under both querying and browsing paradigms [ZobMof92]. Both Boolean and natural language queries are supported, and response times are sub-second even for multigigabyte databases. SIM is standards-based. It directly supports SGML, the international standard for document representation and interchange, and Z39.50, the international standard for client-server communication in information retrieval applications [SacArn95]. For Web access, SIM supports translation from HTTP to Z39.50. Because SIM supports SGML directly, it can handle documents of arbitrary complexity, and a collection of documents can be treated as a database of information.
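
The abstract names compressed inverted files as SIM's core access structure but gives no details of the format. The minimal Python sketch below illustrates the general technique under stated assumptions. Each term's postings list stores the gaps between ascending document numbers (d-gaps), and each gap is written with a simple variable-byte code. The variable-byte code, the whitespace tokenizer, and the function names (`build_index`, `postings`, `boolean_and`) are hypothetical illustrative choices, not SIM's actual design. The compression methods discussed in [ZobMof92] may differ. A conjunctive Boolean query is answered by decompressing the postings lists and intersecting them, shortest list first.

```python
from collections import defaultdict


def vbyte_encode(n: int) -> bytes:
    """Encode a non-negative integer with a variable-byte code.

    Each byte carries 7 data bits, least significant group first;
    the high bit is set only on the last byte of the number.
    """
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data: bytes) -> list[int]:
    """Decode a concatenation of variable-byte codes back into integers."""
    nums, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:          # stop bit: this number is complete
            nums.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return nums


def build_index(docs: dict[int, str]) -> dict[str, bytes]:
    """Build a compressed inverted file: term -> variable-byte-coded d-gaps."""
    lists = defaultdict(list)
    for doc_id in sorted(docs):                      # ascending document numbers
        for term in set(docs[doc_id].lower().split()):
            lists[term].append(doc_id)
    index = {}
    for term, ids in lists.items():
        buf, prev = bytearray(), 0
        for d in ids:
            buf += vbyte_encode(d - prev)            # store the gap, not the id
            prev = d
        index[term] = bytes(buf)
    return index


def postings(index: dict[str, bytes], term: str) -> list[int]:
    """Decompress one postings list into absolute document numbers."""
    ids, total = [], 0
    for gap in vbyte_decode(index.get(term.lower(), b"")):
        total += gap
        ids.append(total)
    return ids


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Linear merge of two ascending lists, keeping common elements."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index: dict[str, bytes], terms: list[str]) -> list[int]:
    """Answer a conjunctive Boolean query, intersecting shortest lists first."""
    lists = sorted((postings(index, t) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


if __name__ == "__main__":
    docs = {
        3: "legislation database of the parliament",
        7: "sgml markup for legislation",
        300: "consolidated legislation of the parliament",
    }
    index = build_index(docs)
    print(postings(index, "legislation"))                     # [3, 7, 300]
    print(boolean_and(index, ["legislation", "parliament"]))  # [3, 300]
```

Storing gaps rather than absolute document numbers keeps most stored values small. In this sketch, values below 128 take one byte, while larger gaps, such as 293 between documents 7 and 300, take two.
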
SIM is supported and marketed in Australia and New Zealand by Ferntree Computer Corporation. Research and

Similar resources

Docbase - a Database Environment for Structured Documents

Standard Generalized Markup Language (SGML) has been widely accepted as a standard for document representation. The strength of SGML lies in the fact that it embeds logical structural information in documents while preserving a human-readable form. This structural information in SGML documents allows processing of these documents using database techniques. SGML facilitates this goal by providin...

Extending SGML to Accommodate Database Functions: A Methodological Overview

* Partially supported by US Dept. of Education award number P200A502367 and NSF Research and Infrastructure grant, award number NSF CDA-9303189. Abstract A method for augmenting an SGML document repository with database functionality is presented. SGML [ISO 8879, 1986] has been widely accepted as a standard language for writing text with added structural information that gives the text greater ...

Standardizing the Querying Process with SGML: The SQL DTD

One of the most exciting applications of SGML which has emerged in the recent years is its use in document databases. The structural information embedded in SGML documents makes it possible to query SGML documents and extract information in an automatic manner; however, this querying process has not been standardized. As a result, different SGML database implementations use their own query lang...

Database Systems for Structured Documents

Documents stored in a database system can have complex internal structure described by languages such as SGML. How to take advantage of this structure presents challenges for database system implementors. We classify the types of queries that need to be supported by SGML-conformant database systems. We then describe several data models that have been proposed for representing documents in a dat...

Toward the Union of Databases and Document Management: The Design of DocBase

With the advent of the World Wide Web (WWW) and the increased use of electronic documents in almost all aspects of computing, the problems of management of and systematic information retrieval from electronic documents have become highly pertinent. Information retrieval (IR) techniques allow us to retrieve documents based on keywords, but often these searches are not powerful enough to accurate...

Publication date: 1996